                         DOCUMENT RESUME

ED 170 351                                            TM 008 766

AUTHOR          Spencer, Mary L.; And Others
TITLE           Measures of Non-Academic Functional Literacy in
                Children. An Evaluation of Available Instruments.
INSTITUTION     Pacific Training and Technical Assistance Corp.,
                Berkeley, Calif.
SPONS AGENCY    System Development Corp., Santa Monica, Calif.
PUB DATE        13 Oct 75
NOTE            96p.; For related document, see TM 008 749

EDRS PRICE      MF01/PC05 Plus Postage.
DESCRIPTORS     Basic Skills; Compensatory Education Programs;
                *Evaluation Criteria; *Functional Illiteracy;
                *Functional Reading; Intermediate Grades; Junior High
                Schools; Literacy; Program Effectiveness; *Reading
                Tests; *Test Reviews; *Test Selection
IDENTIFIERS     Adult Performance Level; Basic Skills Reading Mastery
                Test; Elementary Secondary Education Act Title I;
                Fundamental Achievement Series; National Assessment
                of Educational Progress; New York State Basic
                Competency Test in Reading; Reading Everyday
                Activities in Life; Test of Adult Functional
                Competency

ABSTRACT
     As part of the development of a functional literacy test for
fourth through eighth grade children in Title I compensatory
education programs, this report enumerates a set of criteria for
selecting appropriate tests. The criteria are grouped into six
categories: (1) test background; (2) psychometric quality; (3)
examinee appropriateness; (4) normative standards; (5) administrative
usability; and (6) interpretation. The six tests reviewed as
potential instruments are the Adult Performance Level Test, Basic
Reading Skills Mastery Test, Fundamental Achievement Series, National
Assessment of Educational Progress, New York State Basic Competency
Test, and Reading/Everyday Activities in Life. None of these tests
meets all the criteria. Alternative solutions proposed include
developing a new test, or constructing a test using parts of existing
instruments. (MH)

***********************************************************************
*   Reproductions supplied by EDRS are the best that can be made     *
*                   from the original document.                      *
***********************************************************************




MEASURES OF NON-ACADEMIC 
FUNCTIONAL LITERACY 
IN CHILDREN 


AN EVALUATION OF
AVAILABLE INSTRUMENTS 


U.S. DEPARTMENT OF HEALTH,
EDUCATION & WELFARE
NATIONAL INSTITUTE OF
EDUCATION

THIS DOCUMENT HAS BEEN REPRODUCED EXACTLY AS RECEIVED FROM
THE PERSON OR ORGANIZATION ORIGINATING IT. POINTS OF VIEW OR
OPINIONS STATED DO NOT NECESSARILY REPRESENT OFFICIAL NATIONAL
INSTITUTE OF EDUCATION POSITION OR POLICY.




By

Mary L. Spencer
Nicolas Fedan and Bobby R. Offutt


"PERMISSION TO REPRODUCE THIS MATERIAL HAS BEEN GRANTED BY
[signature] TO THE EDUCATIONAL RESOURCES INFORMATION CENTER
(ERIC) AND USERS OF THE ERIC SYSTEM."


Submitted to:

System Development Corporation
Santa Monica, California 90406


October 13, 1975
Several test authors, publishers, and agency sources of tests were
extremely helpful to this review. Their efforts to relay test
materials and information in a highly expeditious manner are deeply
appreciated, particularly in cases where materials pending copyright
were entrusted to the reviewers. Mention is due to the following
persons:


Dr. Norvell Northcutt
Adult Performance Level Test

Dr. Kenneth Majer
Basic Skills Reading Mastery Tests

Dr. Marilyn Lichtman
Reading/Everyday Activities in Life

Jane Algozine
New York State Basic Competency Test

Dr. Harold Wilson
National Assessment of Educational Progress

Psychological Corporation
Fundamental Achievement Series

MEASURES OF NON-ACADEMIC FUNCTIONAL LITERACY IN CHILDREN

AN EVALUATION OF AVAILABLE INSTRUMENTS FOR INCLUSION
IN THE STUDY OF SUSTAINING EFFECTS OF ESEA TITLE I
COMPENSATORY EDUCATION


TABLE OF CONTENTS


INTRODUCTION
     Requirements for a Measure of
     Functional Literacy in Children

TEST REVIEW CRITERIA
     Outline
     Test Background
     Psychometric Quality
     Appropriateness
     Normative Standards
     Administration
     Interpretation

TEST EVALUATION
     Adult Performance Level Test
     Basic Reading Skills Mastery Test
     Fundamental Achievement Series
     National Assessment of Educational Progress
     New York State Basic Competency Tests
     Reading/Everyday Activities in Life

SUMMARY AND IMPLICATIONS

REFERENCES


MEASURES OF NON-ACADEMIC FUNCTIONAL LITERACY IN CHILDREN

AN EVALUATION OF AVAILABLE INSTRUMENTS FOR INCLUSION
IN THE STUDY OF SUSTAINING EFFECTS OF ESEA TITLE I
COMPENSATORY EDUCATION


INTRODUCTION 


The Study of the Sustaining Effects of Title I Compensatory
Education on Basic Skills, to be conducted by System Development
Corporation, will describe and evaluate the program for economically
and educationally disadvantaged children. The assessment of student
performance has been identified as one of the major activities of the
study. Toward this end, a norm-referenced evaluation model will be
implemented via the administration of a standardized achievement test
to 160,000 children in the Fall and Spring of three consecutive
school years, beginning in the Fall of 1976. Due to increasing
concern regarding the use of standardized achievement tests with
disadvantaged and minority students, the United States Office of
Education has directed that measures of more life-like, non-academic,
or functional instances of literacy in children be evaluated for
selection as an adjunct index of reading and mathematics ability.

In service of the need to supplement indices of reading and
mathematics ability derived from standardized achievement tests,
Pacific Consultants has conducted an extensive search and review of
relevant literature from educational and psychological journals,
research and information clearinghouses, government projects, and
individual investigators. This process has resulted in the production
of a definition of functional literacy in schoolchildren, the
development of criteria for the evaluation and selection of a measure
of functional literacy in schoolchildren, and the identification of a
set of candidate instruments.

While the definition and criteria were delineated in a previous
report (August 29, 1975), the purpose of the present report is to set
forth a refined set of criteria for test selection, and to review and
evaluate the identified measures. Subsequently presented are a
summary of findings, and a discussion of their implications for the
assessment of functional literacy in the Title I Study of Sustaining
Effects.


Requirements for a Measure of Functional Literacy in
the ESEA Title I Study of Sustaining Effects

A general description of the characteristics desired in the test of
functional literacy was stated in SDC's Statement of Work for the
Title I evaluation and amplified in later discussions between SDC,
Pacific Consultants, the United States Office of Education, and the
Functional Literacy Panel. These characteristics can be briefly
represented as follows:

First, the instrument must clearly measure the operational
definition of functional literacy that was developed for the Study of
Sustaining Effects. Accordingly, functional literacy is viewed as the
reading and computational skills required by children as they deal
with the contemporary non-school related world. It must be an
independent test in the sense that it was specifically designed to
measure functional literacy rather than being the reading or
computational portion of an achievement test battery.

Second, the level, range, and content of the test must be
appropriate for elementary school children in grades 4 to 8,
including children from disadvantaged backgrounds.

Third, costs of the test should be in the normal range of costs for
comparable tests.

Fourth, the test must be capable of group administration by
non-expert school personnel employing uniform procedures across the
country.

Fifth, the test should be amenable to machine scoring.

Sixth, if a norm-referenced test is used, the norms pertaining to
the population of the study should be available. If a
criterion-referenced test is used, the criteria on which the test is
developed should relate in a valid way to compensatory education
objectives.

Seventh, evidence of reliability should be available.

Eighth, the test must validly measure the behavior addressed by the
operational definition of functional literacy.

Ninth, the test must reflect concerns for the propriety of content
for a pupil population which is highly diverse in basic skills and
ethnic background.


TEST REVIEW CRITERIA

Characteristics of the test, the nature of the examinees, and the
purpose of testing are important factors in selecting a test of
functional literacy for use in the ESEA Title I Study of Sustaining
Effects. The criteria for test selection presented here are based
very largely on the general guidelines provided by the American
Psychological Association's Standards for Educational and
Psychological Tests (1974), and the criteria employed in the test
evaluations at the Center for the Study of Evaluation (CSE), as
presented in the document CSE Elementary School Test Evaluations
(1970), authored by Ralph Hoepfner and others. Additional criteria
were suggested by Pacific Consultants' previous review of reading and
literacy [illegible] for the [illegible] Evaluation (1974), and the
recent examination of tests of adult functional literacy performed at
the Northwest Regional Educational Laboratory under the direction of
Dean Nafziger (1975).

The criteria suggested by the sources indicated above provided a
reasonably complete compilation of factors relevant to test
selection, but were not concerned specifically with the measurement
of functional literacy in grades 4 to 8 for the purpose of program
evaluation. A number of general recommendations were not suitable in
meeting the special requirements of the Title I Evaluation, and were
therefore modified as necessary. The proposed criteria are organized
according to six general areas: 1) test background; 2) psychometric
quality; 3) examinee appropriateness; 4) normative standards; 5)
administrative usability; and 6) interpretation. These are in
approximate correspondence with the areas identified in the CSE test
evaluation system.

OUTLINE AND WEIGHTING OF TEST REVIEW CRITERIA

Test Background

     Criteria in this section consider whether the purpose for which
     the test was constructed is clearly stated, and whether the
     manner of test construction is compatible with its purpose.

Psychometric Quality

  1. Validity

     Content validity includes criteria encompassing a clear
     definition of what the test measures; criteria on the
     behavioral, linguistic, socioeconomic function, program
     referencing, and task-reference of the test items. Empirical
     validity includes criteria on the empirical evidence relating
     test scores to other variables or real-life outcomes. Construct
     validity includes criteria on the theoretical basis of the
     functional literacy concepts.

  2. Reliability

     Criteria are included for four measures of reliability:
     comparability, or alternate-form equivalence; stability, or
     test-retest correlations; internal consistency, or item
     intercorrelations; and standard error of measurement, or how
     much a score is likely to vary.

  3. Test-Item Structure

     Included are criteria on the relevance of item construction
     procedures, item selection procedures, and item difficulty.

Appropriateness

  1. Instruction

     Criteria are included for insuring that instructions are clear,
     that they include an unambiguous explanation of the purpose of
     the test, that they are comprehensive, that sample item(s) are
     included, and that they be presented orally.

  2. Items

     Criteria are included in this section which insure that items
     have propriety, and are motivating to examinees.

  3. Format and Procedure

     Criteria here address physical quality, layout, timing,
     response mode, and complexity of the test.

Normative Standards

     Criteria in this section address availability of norms, quality
     and representativeness of the normative sample, reporting
     categories, and desirable types of item statistics.

Administration

     This area addresses desirable characteristics of the test in
     terms of examiner expertise, optimal length of testing time,
     scoring, test setting, materials, and cost.

Interpretation

     Criteria of interpretation address the quality and organization
     of manuals, clarity of score interpretation, and the implication
     of the test score.


Test Background

  1. The purpose of the test should be explicitly stated. For
     instance, "...to measure the examinee's performance on tasks or
     activities held to be significant to a student's life outside
     of school." Examples should be mentioned (e.g., "...to read and
     follow the directions on a medicine bottle").

  2. The purpose statement should be clear to those individuals
     likely to administer the test.

  3. The construction of the test should follow closely the purpose
     for which the test was built. Thus, a diagnostic test (such as
     the one envisioned for the present study) should state how the
     test's purpose translates or agrees with the scope of tasks and
     operations to be covered. Such scope should be limited, well
     defined, and detailed. Diagnostic tests should be designed to
     yield scores on the separate components of interest. Since the
     test is not envisioned to be a selection or certification
     instrument, the range of item difficulties and the power with
     which the test discriminates among examinees is less important.
     (A possible variation from this criterion would arise if the
     test is envisioned to discriminate among age groups. In this
     case, the test would also have a discriminatory purpose which
     requires different item properties.)

  4. The test should indicate whether differences among minority
     groups were considered during test construction. If such
     consideration was exercised, items should have been sampled
     which depict the actual behaviors of students in these groups
     in extracurricular life activities.
Psychometric Quality

Criteria addressed in this section pertain to the validity,
reliability, comparability of the test scores, and the quality of
normative standards.

1. Validity -- The criteria in this area concern the nature of what
   is measured by the test. It is most important that the test be
   clearly and unambiguously a measure of functional literacy, if
   its role in the Title I Evaluation is to be served. Factors
   contributing to the credibility of a test as measuring functional
   literacy are considered in terms of content, empirical, and
   construct validity.

   a. Content Validity -- It is highly desirable that the test be
      representative of a definable population of items and
      performances with specific reference to the domain of
      functional literacy. The bases of definition and the
      procedures of test construction contribute to content validity
      in terms of the criteria outlined below.

      1) Definition -- The test should be specifically designed as a
         test of functional literacy. Disagreement on the validity
         of content will surely arise if the test was originally
         designed for some other purpose, and if no explicit basis
         exists for judging the relevance of items.

      2) Material Domain -- The stimulus materials should be
         representative of those commonly encountered in real-life
         reading and computational tasks. Confidence in the
         representativeness of materials would be increased if a
         population of such materials were defined, the composition
         of the population was described in terms of types or
         characteristics of materials, and formed part of the
         definition of functional literacy used as the basis of test
         development.

      3) Behavior Domain -- The performance required in the items
         should be as close an approximation as possible of the
         tasks and behaviors commonly required in real-life reading
         and computational performances of children between the ages
         of 9 to 14. Explicit classification and/or description of a
         domain of functional literacy behaviors is desirable as
         part of the definition used as a basis for test
         development.

      4) Symbolic Domain -- The language and other symbolic
         representations which form the communicative component of
         the materials should be representative of the symbolic
         content commonly encountered in real-life reading and
         computational tasks. Specification of the symbolic content
         in linguistic and mathematical terms can further strengthen
         and clarify the definition of functional literacy beyond
         the material and behavior specifications usually
         considered. Such specifications could be particularly
         helpful in defining levels or ranges of competence in
         relation to the domains of materials and tasks.

      5) Socioeconomic Domain -- The materials and tasks should be
         representative of the socioeconomic functions commonly
         encountered in real-life reading and computational tasks.
         A classification or description of socioeconomic functions,
         and the benefits or values of performance, should be part
         of the definition of functional literacy that is used as a
         basis of test development. Such a classification or
         appraisal system would help insure that functionally
         significant, rather than trivial, performances are
         represented.

      6) Program Objective -- The materials and tasks of the
         functional literacy test should not be referenced to
         specific program objectives. Program-referencing would
         amount to prejudging the result of the evaluation in
         relation to functional literacy, in that it would
         inevitably bias the evaluation in favor of program goals
         and those programs which emphasized the defined objectives.
         The test is intended to provide an objective criterion by
         means of which the effectiveness of various programs can be
         judged in the area of functional literacy.

      7) Criterial Objectives -- The definition of functional
         literacy should be supplemented and operationalized by the
         specification of a set of criterial tasks referenced
         directly to the characteristics of materials, behavior,
         symbolic content, and functions employed in the functional
         literacy definition. Such objectives would provide an
         important link between definition and items. Such
         objectives might be used in item construction and
         selection, or as a basis for empirical validation of items.


   b. Empirical Validity -- It is desirable that the test have been
      used in previous studies, thus providing empirical evidence
      relating the test scores meaningfully to other variables.
      Areas of concern in relation to empirical validity are
      outlined below.

      1) Concurrent Relations - It is advantageous but not essential
         that the test has been correlated in previous studies with
         a wide variety of other measures taken at the same time.
         The number and quality of studies, the number of variables,
         and the diversity of variables all contribute to the
         evidence bearing on the meaning of a given literacy score.

      2) Predictive Relations - It is advantageous but not essential
         that the test be correlated with measures taken at some
         later time. The number and quality of studies as well as
         the number and diversity of variables contribute to the
         evidence bearing on the question of what consequences flow
         from having attained a given literacy score.

      3) Causality - It is advantageous, though not essential, that
         studies have been performed which relate the functional
         literacy test to important psychological, educational, or
         socioeconomic independent variables. Such evidence should
         be of assistance in the analysis and interpretation of the
         findings in the Title I Evaluation.

      4) Nature of Relations - Empirical relationships found in the
         available literature should be reasonably interpretable in
         terms of prevailing educational, psychological, and
         socioeconomic theory. The measure of functional literacy
         should relate sensibly to variables which can be considered
         to reflect components of functional literacy, and to
         variables which are thought to be independent of functional
         literacy. Factor analytic studies, if any are available,
         should indicate that the measure of functional literacy is
         factorially complex. The nature of one particular
         relationship is especially important. The subtests should
         not correlate too highly with standardized tests of reading
         ability or computational skills. Very high correlations of
         this sort would indicate that the test did not adequately
         represent the separate skills required in a functional
         literacy measure.

      5) Sensitivity - It is advantageous that the magnitude of
         effects observed was substantial when the test was used as
         a dependent variable in experiments or evaluations. That
         is, the test should be sensitive to the effects of
         appropriate independent variables, so that there is some
         assurance that appropriate effects will be revealed in the
         Title I Evaluation as well.


   c. Construct Validity -- Criteria in this area have to do with
      the theoretical basis of the functional literacy concept. They
      are of lesser importance in judging validity than are content
      and empirical criteria, given the practical concerns of the
      Title I Evaluation. But they are valuable characteristics
      nonetheless.

      1) Process Constructs - The conceptualization, development,
         and empirical validation of the test should be grounded on
         relevant psychological, linguistic, and educational theory
         in the area of reading and computation. Particularly
         important in this respect is the availability of a
         task-skills analysis which would define the components of
         functional literacy, indicate the relationships among
         components, and the relationship of performance to basic
         cognitive information processing operations. Such a
         theoretical foundation is useful in generating hypotheses
         and interpreting findings.

      2) Acquisition Constructs - The conceptualization,
         development, and empirical validation of the test should be
         grounded in relevant psychological, linguistic, and
         educational theory in the areas of instruction and
         cognitive and language development. Such formulations would
         provide a basis for tying changes in functional literacy to
         specific educational practices, and related developmental
         changes.

      3) Socioeconomic Constructs - The conceptualization,
         development and validation of the test should be grounded
         on relevant social and economic theory to provide a basis
         for hypotheses and interpretations of findings concerning
         relevant socioeconomic variables, and the function and
         benefits of literacy.
2. Reliability -- The question here is how well does the test
   measure what it does measure?

   a. Comparability - If alternative forms are available, they
      should be based on parallel items with comparable item
      statistics. The forms should correlate .80 or above at each
      grade level in the 4 to 8 grade range. Although seldom
      provided in the early stages of test development, this is the
      preferred measure of reliability. In practice, two forms would
      be considered comparable (equivalent) if 1) they include the
      same number and kind of items; 2) standard deviations in the
      two forms are not significantly different; and 3) means
      obtained with the two forms are not significantly different.

   b. Stability - Test-retest correlations should be .80 or above
      over brief time intervals; i.e., one month or less.
      Reliability coefficients could be lower over longer intervals,
      particularly when instructional experiences have intervened,
      having a substantial effect on the level of functional
      literacy performance. However, in the case where no shift in
      level of performance has occurred, the reliability should
      remain above .70 for intervals up to one year.

   c. Internal Consistency - High internal consistency is not a
      necessary criterion for the functional literacy test, since a
      test which is highly homogeneous is not likely to represent
      the full diversity of tasks which should be sampled in a
      functional literacy test. In particular, items involving
      reading should only be moderately related to computational
      items. The correlation between reading and computational
      subtests, if present in the test, should be below .70, and
      preferably below .50. Where alternate forms are available,
      then evidence of internal consistency is highly desirable.

   d. Standard Error of Measurement - A statistic which allows an
      interpretation of the reliability of each score is desirable.
      If the test discriminates at various age levels, the standard
      error of measurement for each level would show how well this
      differentiation is accomplished.
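
The numeric thresholds stated in the reliability criteria can be
expressed as a small screening routine. The following is a minimal
sketch only. The function names and the idea of coding the criteria
this way are illustrative assumptions, not part of the report. The
report also does not say how the standard error of measurement should
be computed; the sketch assumes the classical formula, SEM = SD x
sqrt(1 - reliability), and it assumes that the .70 stability floor
applies only where no shift in performance level has occurred.

```python
import math
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return sxy / math.sqrt(sxx * syy)


def meets_comparability(r):
    """Alternate forms should correlate .80 or above (checked per grade level)."""
    return r >= 0.80


def meets_stability(r, interval_months):
    """Test-retest rule: .80 or above up to one month; above .70 up to one
    year, assuming no shift in performance level. Returns None for longer
    intervals, for which the criteria state no threshold."""
    if interval_months <= 1:
        return r >= 0.80
    if interval_months <= 12:
        return r > 0.70
    return None


def subtest_overlap(r):
    """Classify the reading/computation subtest correlation:
    below .50 is preferred, below .70 is acceptable."""
    if r < 0.50:
        return "preferred"
    if r < 0.70:
        return "acceptable"
    return "too high"


def standard_error_of_measurement(scores, reliability):
    """Classical SEM (an assumption here): sample SD times sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return stdev(scores) * math.sqrt(1.0 - reliability)
```

Applied separately to the scores of each grade from 4 to 8, such a
routine would show at which levels a candidate test satisfies the
comparability and stability thresholds, and how wide the band of
uncertainty around an individual score is at each level.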


3. Test-Item Structure

   a. Item Construction - Procedures used in item sampling should be
      clearly defined and replicable. It is necessary that test
      information indicate the relevance and representativeness of
      the item pool in relation to the aspects specified in the
      definition of functional literacy, whether material,
      behavioral, symbolic, or socioeconomic criteria are included.
      Procedures which are [illegible] would be most advantageous
      but are not within the usual state of the art at present.
      Other procedures are acceptable if the resulting items show
      close correspondence to the classification systems employed in
      defining functional literacy.

   b. Item Selection - Procedures used in selecting items from a
      pool for inclusion in the final test should be based on
      observations of actual behavior, and yield evidence that the
      items load evenly on the various categories defining
      functional literacy.

   c. Item Difficulty - Items should include a wide range of
      difficulties, including some items relatively easy for 4th
      grade children, and some items relatively difficult for 8th
      graders. Additionally, in view of the diagnostic use of the
      test, it should include a sufficient number of "easy" items so
      as to yield a useful analysis of examinees' strengths and
      weaknesses.
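
The item difficulty criterion can be illustrated with the classical
difficulty index, the proportion of examinees who answer an item
correctly. The report names neither an index nor any cut-off values,
so everything specific in the sketch below is an assumption made for
illustration: the function names, the 0/1 scoring layout, and the
cut-offs of .70 ("relatively easy") and .30 ("relatively difficult").

```python
def item_difficulty(responses):
    """Proportion correct per item.

    `responses` is a list with one entry per examinee; each entry is a
    list of 0/1 item scores, all of the same length.
    """
    if not responses:
        raise ValueError("no examinees")
    n_items = len(responses[0])
    if any(len(r) != n_items for r in responses):
        raise ValueError("every examinee needs a score for every item")
    n = len(responses)
    return [sum(r[i] for r in responses) / n for i in range(n_items)]


def covers_grade_range(p_grade4, p_grade8, easy=0.70, hard=0.30):
    """True if at least one item is easy for 4th graders and at least one
    item is difficult for 8th graders (cut-offs are assumed, not sourced)."""
    has_easy_for_4th = any(p >= easy for p in p_grade4)
    has_hard_for_8th = any(p <= hard for p in p_grade8)
    return has_easy_for_4th and has_hard_for_8th


# Hypothetical scored responses for a three-item test.
grade4 = [[1, 0, 0], [1, 1, 0], [1, 0, 0], [1, 0, 0]]
grade8 = [[1, 1, 0], [1, 1, 0], [1, 1, 1], [1, 1, 0]]

p4 = item_difficulty(grade4)
p8 = item_difficulty(grade8)
range_ok = covers_grade_range(p4, p8)
```

A reviewer could tabulate these proportions by grade to see whether a
candidate test contains enough easy items to support the diagnostic
analysis of strengths and weaknesses that the criterion calls for.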




Appropriateness

The third set of criteria concern the appropriateness of the test in
relation to characteristics of the intended sample of examinees. The
criteria focus on the three areas of instructions, items, and format
and procedure. The present criteria insure that irrelevant sources of
difficulty are eliminated from the test.

1. Instructions

   a. Clarity - The instructions should be appropriate in
      orientation and tone, inoffensive in content, and
      comprehensible, with vocabulary and syntax suitable for
      children in the 4 to 8 grade range.

   b. Purpose - The instructions should provide an honest
      explanation of the test's purpose and intended use.

   c. Comprehensiveness - The instructions should precisely and
      completely describe all requirements of the tasks presented in
      the items so that the examinee has all the information needed
      to adopt an effective performance strategy. Appropriate
      instructions should be included on the relation between
      guessing and test scores.

   d. Sample Items - The instructions should include sample items
      accurately illustrating task requirements and the level of
      difficulty of the tasks.


realistic facsimilies of the actual materials, 


/ yeading and computational tasks should present 18. 
i 
¢ 


“+ @e - Mode - The instructions, should be presented ys 
_, dn an oral mode. A standardized script | Ty 

8 fe y : : 

' should be available which is suitable for 


fluid oral reading by non-expert examiners. 


i 
| 
2. Items | | . at . “a” 


oom memes ASA 


a. Motivation - The items should be relevant, 
a 


a 2 


up-to-date, and interesting for children in 


ra the 4 to 8 grade range, so as to arouse intrinsic, 


ar 


motivation in task performance without 
extensive exhortations eine required to 
induce cooperation and effort. 
b. propriety - The content of the items should 
not involve any invasion of privacy, or any 


sexist, racist, or otherwise offensive 


aspects of content. 


3. Format and Procedure 
PALA a oa 


a. Physical Quality - The paper should be of 
good quality, the print bold and readable, 
ry and the illustrations clear and up-to-date. | 


Reproduction of materials involved in common 


25 oe . a. 


preferably including’ full-colot reproductions. 
b. Layout - The test should be effectively
arranged and cued to facilitate recognition
of items as units, the perception of the
relation of item stem to answers and examinee
response, and the segregation of
successive items and pages.
c. Timing - The test should be time limited
but permit most examinees to attempt most
items within the time allowed. Sectioning
of a test, with timing instructions for
each section, may help to maintain appropriate
pacing in the brief time allotted
for this test. Items at all difficulty
levels should be represented in each section.

d. Response Mode - The response should be marked
in a fashion permitting machine scoring.
e. Complexity - Each item should require one
simple and direct response with no multiple
steps or complications other than those
intrinsic in the task represented by the
item. Several items might be used based on
the same stimulus materials, provided that
the relationship of each item to the stimulus
is clear.

Normative Standards

1. Data Available. Although normative data is
not essential in view of the large sample to
be tested in the Title I Evaluation, and the
emphasis on program comparison in the evaluation,
it will still be helpful to have some prior
normative data available as a basis for comparison.

2. Normative Sample. It is desirable that normative
data be available for the 4 to 8 grade range,
and for adults as well.

3. Representative. It is desirable that the
sample be representative of racial, ethnic,
sex, geographic, and socioeconomic strata,
rather than the result of incidental sampling.

4. Reporting. It is desirable that normative
data can be reported both separately for, and
combined over, the racial, ethnic, geographical,
and socioeconomic strata represented in the sample.

5. Item Statistics. It is useful if item statistics
are reported both for the whole sample and
for the separate strata. Item difficulties are
the most important statistic, but if selection
or classification uses are envisioned, then item
discrimination indices and intercorrelations are
useful as well.

Administration

1. Personnel. Non-expert school personnel
should be capable of administering the
test with very little training. The services
of a specialist or a testing team
or extensive training should not be required.

2. Scheduling. The test should require no
more than 30 minutes of testing time (preferably
20 minutes) on one occasion of testing.
Tests taking longer than 30 minutes should
be easily modifiable for a shorter length,
with no more than normally expected loss
of reliability.

3. Setting. The test should be capable of
administration in usual classroom settings,
to group sizes in the normal range for intact
classroom groups, and without the necessity
of special equipment.

4. Scoring Method. The test should be scored in an
objective manner by machine. Machine scoring
should be highly fail-safe and reliable,
without complex error checking routines
to proof the results.


5. Materials. The test materials should be
entirely of the paper-and-pencil test variety,
with no special manipulanda, slides,
or other unusual components.

6. Cost. Costs should be in the normal range
of paper-and-pencil tests having good quality
paper and printing, including color reproduction.
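
The Scheduling criterion above allows a longer test to be shortened
"with no more than normally expected loss of reliability." One
conventional way to estimate that expected loss is the Spearman-Brown
formula. The criteria do not name a method, so the sketch below assumes
this one, and the function name and the example figures (a 45 minute
test with reliability .90 cut to 30 minutes) are purely illustrative.

```python
def spearman_brown(reliability, length_ratio):
    """Predicted reliability when test length is multiplied by length_ratio.

    Assumes the items removed (or added) are parallel to those retained.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    if length_ratio <= 0:
        raise ValueError("length_ratio must be positive")
    return (length_ratio * reliability) / (1.0 + (length_ratio - 1.0) * reliability)


# Hypothetical figures: a 45 minute test with reliability .90,
# shortened to 30 minutes (two thirds of its original length).
shortened = spearman_brown(0.90, 30 / 45)
print(round(shortened, 3))
```

A shortened form whose observed reliability falls well below the
predicted value would have lost more than the "normally expected"
amount.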


Interpretation

1. Manuals. A high quality test manual should
be available, one which meets the appropriate
APA standards (1974) for test manuals. A supplemental
brochure describing the test and
how to interpret its scores should also be
available for relatively unsophisticated
consumers of the results.


2. Meaning. The test scores should be highly
meaningful and understandable in terms of
specific performance by a non-technical
audience including the general public. It
would be most meaningful if a hierarchy of
performance levels could be devised, in
which a person placed at one level could be
described as being capable of a specific list
of tasks and all tasks listed at lower
levels. However, this may be an unrealistic
goal.
3. Scales. The primary test scores should
be directly understandable in absolute
terms without the use of complex conversions
or scaling. Forms of scaling or conversion
to standardized scores may be used as a
supplement to the primary scores or for
use by audiences with a higher level of
technical background.
4. Implications. It is desirable that the
implications of given test scores for
educational practice or public policy be
clear and relatively direct. However,
what is actually required to meet this
criterion is not entirely certain.
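
The hierarchy of performance levels described under Meaning can be
sketched as a simple placement rule: an examinee is placed at the
highest level for which every task at that level and at all lower
levels was performed. The level contents, task names, and the
all-tasks-passed rule below are assumptions made for illustration;
the criteria do not specify how such levels would be scored.

```python
def placement_level(levels, passed):
    """Return the highest level whose tasks, and all lower-level tasks, were passed.

    levels: list of task-name lists, ordered from lowest to highest level.
    passed: set of task names the examinee performed correctly.
    Returns 0 if even the lowest level was not fully passed.
    """
    placed = 0
    for number, tasks in enumerate(levels, start=1):
        if all(task in passed for task in tasks):
            placed = number
        else:
            break  # a failed level caps placement; higher passes do not count
    return placed


# Hypothetical task hierarchy, lowest level first.
levels = [
    ["read street sign", "read price tag"],
    ["follow recipe", "read bus schedule"],
    ["complete order form"],
]
print(placement_level(levels, {"read street sign", "read price tag", "follow recipe"}))
```

Under this rule a score reads directly as "can do everything listed at
this level and below," which is the kind of absolute interpretation the
criterion asks for.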




TEST EVALUATION


Six instruments were selected for review according
to the test review criteria: the Adult Performance Level
Test, the Basic Reading Skills Mastery Test, the Fundamental
Achievement Series, the National Assessment of Educational
Progress, the New York State Basic Competency
Test, and the Reading/Everyday Activities in Life.
Three of these instruments were reviewed previously by
Northwest Regional Educational Laboratory under the direction
of Dean Nafziger (1975). However, the focus and criteria
of that review were determined by a different set of purposes
and population characteristics than are operational in the
Title I Study of Sustaining Effects. These and the remaining
three tests reviewed in this report were selected on the
basis of the reviewers' judgement that they possessed some
property or set of properties that placed them within the
range of promising instruments for the Title I study's
purposes. It should be clear from the outset that the
judgements made of these tests relate only to the potential
usability of the instruments in the Title I Study of Sustaining
Effects, and the test evaluations should not in any way be
construed as either indictments or recommendations of the
instruments for adoption in other contexts.


Adult Performance Level Test of Adult Functional Competency

Adult Performance Level Project
Dr. Norvell Northcutt, Project Director
The University of Texas at Austin
Division of Extension
Austin, Texas 78712


Description

The Adult Performance Level (APL) Project of the
University of Texas Division of Extension developed the
APL test as part of their mission to ". . . specify the
competencies which are functional to economic and educational
success in today's society and to develop devices for
assessing those competencies of the adult population of
the United States." The instrument is currently in an
experimental form and is not considered by its author to be
ready for utilization. It is a short form test of 42
questions that are related to a variety of adult life
experiences. For example, items include a 1040 Individual
Income Tax Return, a bank deposit slip, itemizing of grocery
bills and tax deductions. They require performance in the
areas of communication, computation, problem-solving, and
interpersonal relations.


The APL field-test data were used to define three
functional categories: 1) adults who function with difficulty;
2) functional adults; and 3) proficient adults. Each of the
three APL levels is based on three criteria: 1) predicted
income; 2) education; and 3) job status. The people in the
first APL category are considered to be functionally incompetent
or to function with difficulty. Those in the second
category are competent or functional on a minimal level,
and those in the third category are proficient in that they
demonstrate competence that is associated with a
higher level of income and education.


Test Background

1. Purpose
The purpose of the APL test is to measure
the competencies of adult Americans which are
functional to economic and educational success
in today's society.

2. Clarity of Purpose to Examiners
The APL does not have an examiner's manual
in which purpose would be explained.


3. Compatibility of Purpose and Test Construction
The APL theory of functional competency was
arrived at by focus on the basic requirements
for adult living. A review of the behavioral and
social research literature was made in an effort
to find a way of categorizing the needs of the
undereducated and underemployed adults. The APL
project surveyed the State and Federal agencies in
an effort to distinguish
the successful from the unsuccessful adult.
Additionally, conferences on adult needs were
conducted in different regions of the country.
Through this process, the APL project developed a
multi-faceted model of competency. The elements
of this model included: 1) functional competency
as a construct which is only meaningful in a
specific societal context; 2) functional competency
as a set of skills related to a set of general
knowledge areas imposed by society; and 3) functional
competency as a dynamic rather than a static process.
The information used to develop the model
of functional competency formed the basis of a set
of objectives which were then used as a basis for
the description of behaviors believed to be important
to adult competency. The performance indicators, or
items, were written for each competency. This
evolutionary process established a close relationship
between the purpose and construction of the APL test.

4. Compatibility of Purpose with Item Sampling
The information on item sampling indicates that
efforts were made to select tasks which adults
actually encounter. There is no evidence that actual
observations of adult competency behavior were made
to verify or generate these tasks. An examination
of the tasks shows them to be reasonable experiences
for adults, but highly irrelevant to children in
the 4th to 8th grade age range.


Psychometric Quality

1. Validity
a. Content Validity - The APL was defined
specifically in terms of the domain of non-academic
functional literacy in adults. However,
this domain does not correspond well with that
of children in the age group to be sampled in
the Title I study. The stimulus materials are
representative of those commonly encountered by
adults in real life reading and computational
tasks. There is no evidence that these were
sampled directly from the actual universe of
adult behaviors. It was not within the APL
functional competency study's purpose to even
consider the behavior domain of children. The
symbolic domain was considerably too complex
for the Title I age range.
The APL test was specifically designed for
people who were socio-economically and educationally
poor. As designed, the APL was not referenced
to a specific set of program objectives. The
skills and knowledge areas of the APL competency
model overlap to an incomplete extent with the
definition of functional literacy adopted for
the Title I Study of Sustaining Effects.

b. Empirical Validity - No evidence was presented
in the APL materials reviewed which indicated
that the APL had been related to measures taken
concurrently or at a later point in time. Neither
was there evidence to relate the APL to other
psychological, educational or socioeconomic
independent variables. No information was available
on the nature of relations between the APL
and other measures, or of the sensitivity of the
tasks.
c. Construct Validity - No information was provided.

2. Reliability
When administration procedures are held
relatively constant, the relationship of APL performance
in two independent samples was highly reliable
across various subject characteristics such as income,
education, occupational status, urbanicity, ethnicity,
sex, age, and other demographic variables. Other
indices of reliability were not reported.


3. Test-Item Structure
The item construction of this test, while not
traditional, was identified with what could be considered
typical adult life experiences. Distinct
tasks were assigned to each experience. Four primary
skills were considered indicative of the demands
placed on adults: 1) reading, writing; 2) computation;
3) problem-solving skills; and 4) interpersonal
relation skills. The first two correspond to portions
of the definition of functional literacy adopted for
the Title I study. The knowledge areas of the APL
resemble some of the areas of life activity and the
socioeconomic functions included in the operational
definition of functional literacy in the Title I
study. Item selection was based on expert judgement
and revised successively on the basis of field
testing. Data on item difficulties were not available.
In the reviewers' judgement, most APL tasks are much
too difficult for children in the Title I age range.
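
Item difficulty, which the reviewers could not obtain for the APL, is
conventionally the proportion of examinees answering an item correctly.
The sketch below computes it from a matrix of scored responses, along
with an upper-lower discrimination index. The response data are
invented, and splitting examinees into top and bottom halves by total
score is an assumption; the report does not say which indices would
have been used.

```python
def item_difficulties(responses):
    """Proportion correct per item; responses is a list of 0/1 rows, one per examinee."""
    n = len(responses)
    n_items = len(responses[0])
    return [sum(row[i] for row in responses) / n for i in range(n_items)]


def discrimination_indices(responses):
    """Upper-lower index: proportion correct in the top half minus the bottom half."""
    ranked = sorted(responses, key=sum)  # lowest total scores first
    half = len(ranked) // 2
    if half == 0:
        raise ValueError("need at least two examinees")
    p_lower = item_difficulties(ranked[:half])
    p_upper = item_difficulties(ranked[-half:])
    return [u - l for u, l in zip(p_upper, p_lower)]


# Invented data: four examinees, three items (1 = correct).
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(item_difficulties(responses))       # proportion correct per item
print(discrimination_indices(responses))  # larger values separate strong from weak examinees
```

With pilot data from children in the intended grade range, items with
very low proportions correct would be the ones flagged as too difficult.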


Appropriateness

1. Instructions
In the reviewers' judgement, the vocabulary was
somewhat over-sophisticated for the sample of adults
taking the test, but consideration was given to making
the test less difficult so that every respondent
could attempt every task. Children in the Title I age range
would probably find that the vocabulary and syntax of
instructions do not correspond with their common
life experiences. The task instructions were sufficient
for understanding. No sample items were provided.
Information on test administration procedures was
unavailable. The instructions are amenable to
oral presentation.
2. Items
Most items do not appear sufficiently relevant
or interesting to be motivating to 4th to 8th grade
children. No offensive content was identified in
the APL tasks.
3. Format and Procedures
The physical quality and layout of APL tasks were
of superior quality. Many of the stimulus materials
were good facsimiles of real-life forms and literacy
prototypes. Details on timing were unavailable.
Multiple choice and brief examinee-supplied answers
were the response modes used. Although single direct
responses were usually required in the tasks, some
tasks required several examinee-supplied answers.


Normative Standards

Data on the APL were obtained from "representative"
samples of American adults. Prior to this, the APL was
field tested with 3,500 undereducated and underemployed adults
in 30 states. The test has been administered to five independent
samples of 1,500 or more persons, for a total of 7,500
adults. Detailed test results were not included in the report
made available to the reviewers. The results were reported in a form
which could be classified according to the three APL levels
of competency for each area of knowledge and in each skill
domain. Children were not tested, and no statement was
made to indicate that the sample was systematically stratified
on demographic variables.


Administration

The APL tasks are amenable to group administration in
a classroom setting by nonexpert personnel. A set of these
tasks could be selected in a manner which would produce a
30 minute testing period. No information on the scoring
process was available. In the reviewers' judgement, a
separate machine scoreable response sheet would be required.
The APL materials are entirely of the paper-and-pencil variety.
Information on cost was not provided. The test is still in
experimental form.


Interpretation

No test manual was available to the reviewers. It is
presumed, though not explicitly described in the material
available, that task scores can be linked to the three
levels of APL competency. The principal implication of APL
performance pertains to the respondent's ability to
perform specific tasks which are generally agreed to have
socioeconomic benefits to American adults.


Evaluation

Although the APL measures both reading and computational
skills, it is intended for use with an adult population.
In the reviewers' judgment, most APL tasks are much too
difficult for children in the Title I age range.

From a test construction point of view, although efforts
were made to select tasks actually encountered by adults,
there was no evidence that observations of adult behavior
were made so as to verify these tasks. The absence of validity
data does not allow a judgment on the relationship between
the APL and other tests and variables important to the concept
of functional literacy.

Modification of this test would depend on the availability
of easier items. Since the APL was constructed for use with
adults, it is unlikely that enough easy items can be obtained
from the item pool, so as to construct an instrument appropriate
to the population envisioned in the Title I study.
If this process were attempted anyway, a clinical pretest
of the new instruments would be necessary before any judgment
could be made regarding the test's appropriateness.




Basic Skills Reading Mastery Test

Maryland State Department of Education
Division of Instruction
Baltimore-Washington International Airport
P.O. Box 8717
Baltimore, Maryland 21240


Description

The Basic Skills Reading Mastery Test consists of three
separate forms which purportedly test two of the State's five
reading goals with children in three age groups (12 years to
adult). Four scored subscales were developed on the basis
of behavioral objectives flowing from these goals:
1) following directions; 2) locating references; 3) gaining
information; and 4) understanding forms. The test also
assesses how students feel about reading.


Test Background

1. Purpose
The purpose of the BSRM test is to assess two
of five reading goals adopted by the Maryland
State Board of Education. Specifically, these goals
were ". . . to meet the reading demands for functioning
in society," and ". . . to select reading as a
personal activity."
"Functioning in society" was listed as having
five basic goals: 1) following directions; 2) locating
references; 3) attaining personal development;
4) gaining information; and 5) understanding forms.
The specific age levels designated as the
target population are 12 year-olds, 15 year-olds,
and 18 year-olds.
Although the purpose of the test is compatible
with the instrument envisioned for the Title I study,
the ages of the target population are only partially
overlapping with those to be included in the Title I
study.


2. Clarity of Purpose to Examiners
Clarity was good. The manual makes the
purpose explicit.
3. Compatibility of Purpose and Test Construction
The five basic goals listed above were translated
into specific behavioral objectives by a group
of reading and test development specialists. Using
these behavioral objectives as a guide, Maryland
State department personnel and test developers
solicited published materials and printed forms
from tax offices, welfare agencies, Chamber of
Commerce and other Federal, state and local agencies.
These materials were used to generate a bank of
over 500 test items to correspond to the approved
series of behavioral objectives. Each of the
items was then reviewed by a panel of reading and
test-review experts. Student evaluators were also
used to assess the items for "clarity, logic,
difficulty and readability."




The basic goal of "attaining personal development"
was ". . . designed to assess the attitudes
of students and how they feel about reading." A
separate development procedure was employed for
this category, with items generated based on the
behavioral criteria and then reviewed by teachers
and reading specialists. A small-scale field test
was performed to refine test length and format, to
clarify directions, and to select the best items.
Finally, a statewide sample of students was
selected for the field-test, representing geographical
sectors and minority groups.
At first, the final test contained basic or
easy items, and advanced or difficult items. Later
(after the test had been used with 47,000 students),
two major changes were incorporated: 1) the distinction
between basic and advanced items was dropped in favor
of a distinction between items measuring "survival"
skills vs. those not so viewed; 2) the test was
lengthened to include enough items such that diagnostic
information could be obtained for each basic
goal.

The present reviewers view the above-mentioned
changes as highly compatible with qualities of the
instrument envisioned. However, some problems still
exist as evident from test construction procedures.


4. Compatibility of Purpose with Item Sampling
The manner in which the stimulus materials were
obtained for item construction did not include
actual observation of the children and young adults
for whom the test was intended. Thus, it appears
that whereas some items certainly test the survival
skills of an adult (i.e., voting directions, applications
for driver's license, working permits, W-2
forms, welfare forms, etc.), they do not seem
applicable to all ages. In the 15 and 18 year-old
forms, some items were present (directions for
sewing, an application for U.S. savings bonds, a
chart rating household thermometers, etc.) for
which there is no evidence that members of the
target population have actually been
exposed to them.

There is no evidence that test items depict
actual real-life experiences of minority
groups.

Although the age distribution of items
may be satisfactory (i.e., items actually discriminate
among age groups), it is not clear that
diagnostic assessments of examinees produced by
this test measure survival skills from the frame
of reference of the examinee population. It appears
more likely that any such diagnostic information
produced measures survival skills from the frame
of reference of the examiners.

The items measuring "attaining personal
development" (e.g., "How would you rate yourself
as a reader?" "How do you feel about reading as
a spare-time activity?") may be highly sensitive
to social desirability. No information in this
regard was provided by the manual.

Psychometric Quality


1. Validity
a. Content Validity - The test was referenced
specifically to the domain of functional
literacy. However, the population of possible
stimulus materials was not defined, and thus it
is unclear to what degree the five goals for
which behavioral objectives were determined
fit a theoretical material domain. The test
does not include computational tasks. It is
not known to what degree the test samples from
the behavior, symbolic, and socioeconomic domains.

b. Empirical Validity - No data was given in the
test manual.

c. Construct Validity - No information was given
in the manual.




2. Reliability
Only internal consistency data was available
(Kuder-Richardson 20). All coefficients reported
are satisfactory. No stability or comparability
coefficients were available. No standard error of
measurement was available.


3. Test-item Structure
The procedures used in item construction were
clearly defined. The item pool was referenced to
predetermined behavioral criteria, rather than to
various aspects specified in a definition of the
possible material domain from which items could be
drawn. Item selection was not based on observations
of actual behavior. No computational items were
included. Item difficulty appears appropriate if
referenced to the psychometric properties of the
items. But, as mentioned above, it is unclear
whether the items sample actual behaviors.
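
The Kuder-Richardson 20 coefficient cited under Reliability is an
internal consistency estimate for items scored right or wrong. It
compares the sum of the item variances (p times q for each item) with
the variance of total scores. The sketch below computes it for an
invented response matrix. The data and function name are illustrative,
and the use of the population variance (dividing by N) is an
assumption, since the manual's computational details are not given in
this review.

```python
def kr20(responses):
    """Kuder-Richardson formula 20 for a list of 0/1 response rows (one row per examinee)."""
    n = len(responses)
    k = len(responses[0])
    if n < 2 or k < 2:
        raise ValueError("need at least two examinees and two items")
    # Sum of p_i * q_i over items, where p_i is the proportion passing item i.
    pq_sum = 0.0
    for i in range(k):
        p = sum(row[i] for row in responses) / n
        pq_sum += p * (1.0 - p)
    totals = [sum(row) for row in responses]
    mean = sum(totals) / n
    variance = sum((t - mean) ** 2 for t in totals) / n  # population variance (assumed)
    if variance == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1.0 - pq_sum / variance)


# Invented data: four examinees, three items (1 = correct).
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(round(kr20(responses), 3))
```

Because KR-20 rests on a single administration, it says nothing about
stability over time or comparability of forms, which is why the absence
of those coefficients is noted separately above.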


Appropriateness

1. Instructions
They are clear, but do not explicitly state
the purpose and intended use of the test. Although
they are comprehensive, there are no item examples
included. Although some instructions are presented
orally, the student is required to read the item
instructions. The reading ability required may be
more advanced than necessary.

2. Items
To the extent that they sample actual behaviors,
they appear relevant and motivating. The propriety
of items is maintained throughout.

3. Format and Procedure
The test failed to represent facsimiles of
actual materials, thus rendering the physical quality
of the test rather poor. Layout, timing, response
mode and complexity are acceptable.

Normative Standards

No normative data is available.

Administration

The criteria for this area are deemed to be met by
this test. Personnel required, scheduling, conditions of
testing, scoring, and test components are appropriate. Cost
information is not available in the manual.

Interpretation

No separate manual is provided. Beyond referencing a
"passing" score (80% of items correct), no mention is
made of the meaning of scores or how to interpret them
either for diagnostic or achievement purposes. No implications
of scores are provided.

Evaluation

Two equally large problems exist with this test from
the standpoint of use for the Title I study. First, no
computational skills are included. Second, the test was designed
to be used with a population aged 12 to 18 years old.
Thus, it only partially overlaps with the examinees envisioned
in the Title I Study.


From a construction point of view there are also problems,
the most important of which is that the manner in
which stimulus materials were obtained for item construction
did not include actual observation of the people
for whom it is intended. Consequently, some of the items
(e.g., questions about an application for a U.S. savings
bond) are very unlikely to be appropriate to the population
to be tested in the Title I study.

Modification of this test would hinge on the availability
of an item pool from which easier items could be
obtained; i.e., items which are more appropriate to younger
children. In addition, a computational subtest would
have to be either constructed or adapted from some other
source.

Needless to say, a clinical pretest on the newly-created
subtests would be a requirement before any judgment
can be made of the test's appropriateness to the Title I
Study.




Fundamental Achievement Series

The Psychological Corporation
757 Third Avenue
New York, New York 10017


Description

The Fundamental Achievement Series (FAS) was designed
as a "culture-relevant" test for the disadvantaged. It
consists of a Verbal test and a Numerical test, each
requiring a 30 minute testing period. The test is intended
for use as an employment, placement, or diagnostic test
for adolescents and adults who have had less than the
usual exposure to formal education. It yields three scores:
Verbal, Numerical, and Verbal + Numerical.


Test Background

1. Purpose. The FAS is oriented toward the measurement
of verbal and numerical skills, and is "...
intended for use in the employment of adults and
adolescents who may not have had the usual exposure
to formal education." The test is viewed as a
placement and/or diagnostic device. The tests
cover a range of ability that extends to "... somewhat
above the eighth-grade level."

2. Clarity of Purpose to Examiners. The manual makes
the purpose explicit.

3. Compatibility of Purpose and Test Construction. No
data on test construction are given in the manual;
therefore it is impossible to rate this test on the
relevant criteria. A brief description was given
of types of items included, but no mention was
made of how these items were obtained. The test
has two alternate forms.


Psychometric Quality

1. Validity.

a. No information is given in the manual on content
validity.

b. Empirical validity. The test was administered
to Black employees in a southern hospital. Concurrently,
these employees were rated by their
supervisors on four performance factors. Similar
studies were conducted for employees of an Eastern
bank who were in a private-sector anti-poverty
training program. In this case, success criteria
were teachers' ratings in various subject matter.
Six other similar studies were reported in the
manual. Correlation coefficients ranged from
a high of .62 to a low of -.01. Some of the high
coefficients were encouraging in that they
actually related to "real-life" criteria. About
half the coefficients presented would be considered
"useful" in an industrial selection
situation. Although the purpose of the present
review is to find a diagnostic rather than a
classificatory tool, it is to the present measure's
advantage to note that whatever the
skills are that it measures, they are related
to some real-world job behaviors. The present
measure has been correlated with other tests
such as the Differential Aptitude Tests, California
Test of Mental Maturity and the Wonderlic
Personnel Test. Correlations with these
instruments were not particularly high (in the
.58-.65 range), indicating not too great an overlap
with the skill levels required by the standardized
instruments. This fact is advantageous
under the criteria of the present review.

c. Construct validity. No information is given in
the manual.


2. Reliability.

a. Comparability. The equivalence of the two
forms of the test was examined with three
separate samples, and reported separately
for each component of the test. Although
the manual reports that the two forms are
similar in content and comparable in difficulty,
there is no way of determining
whether the number and kind of items are
equivalent. Only one form was included in
the package. Standard deviations and means
are reported statistically equal for the two
forms in all three samples, with one exception.
This exception disappeared when the
three samples were combined. Thus, for
practical purposes, the two forms may be
considered equivalent. No correlations,
however, were reported in this section.

b. Stability. Test-retest coefficients with
Form "A" revealed coefficients above .90,
with a retest time span of two to three months.
Using Form B only, stability was a bit lower
but still quite good.
Using Form "A" first, then "B", reliability
still remained at .86 for the combined verbal
plus arithmetic score. The sample size,
however, was only 39, rendering this coefficient
open to possible fluctuation from
sample to sample. Overall, stability is
good, and quite appropriate under the standards
of the present review.
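The caution about the retest sample of 39 can be made concrete. The following is a minimal sketch, not a procedure reported in the FAS manual: it applies the standard Fisher z approximation to the reported coefficient of .86. The function name fisher_interval and the 95% level are assumptions made for illustration.

```python
import math


def fisher_interval(r, n, z_crit=1.96):
    """Approximate confidence interval for a correlation coefficient.

    Uses the Fisher z transformation; z_crit=1.96 gives a 95% interval.
    """
    if n <= 3:
        raise ValueError("sample size must exceed 3")
    z = math.atanh(r)                 # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)       # standard error on the z scale
    low = math.tanh(z - z_crit * se)  # transform the bounds back to r
    high = math.tanh(z + z_crit * se)
    return low, high


# Figures taken from the review text: r = .86 with 39 examinees.
low, high = fisher_interval(0.86, 39)
print(f"r = .86, n = 39: approximate 95% interval {low:.2f} to {high:.2f}")
```

Under these assumptions the interval is fairly wide, which is consistent with the reviewers' remark that a coefficient based on 39 cases may fluctuate from sample to sample.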
c. Internal consistency. Internal consistency
was measured with a sample from a Southern
city school system, and it was reported
by race ("White vs. Negro"), and by grade
(Grades 6, 8, 10 and 12 reported separately).
Kuder-Richardson Formula 20 yielded coefficients
above .84 for all grades for both
White and Black samples. These coefficients
were higher for Form B (above .95) although
no breakdown by class and race was given.
For purposes of the present review, these
coefficients were considered appropriate.
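For readers unfamiliar with Kuder-Richardson Formula 20, the following is a minimal sketch of the computation for dichotomously scored items. The response matrix is hypothetical and is not FAS data; the function name kr20 and the use of the population variance of total scores (dividing by the number of examinees) are assumptions, since the manual's computational conventions are not described in this review.

```python
def kr20(responses):
    """Kuder-Richardson Formula 20 for right/wrong (1/0) item scores.

    responses: list of examinee rows, each a list of 0/1 item scores.
    """
    n_people = len(responses)
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n_people
    # Population variance of total scores (an assumed convention).
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_people
    if var_total == 0:
        raise ValueError("total scores have zero variance")
    # Sum over items of p * q, where p is the proportion passing the item.
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n_people
        sum_pq += p * (1.0 - p)
    return (k / (k - 1)) * (1.0 - sum_pq / var_total)


# Hypothetical data: five examinees, four items.
hypothetical = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(hypothetical), 2))
```

The coefficient rises as items covary more strongly relative to their individual variances, which is why values above .84 are read here as evidence of a homogeneous test.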
d. Standard Error of Measurement. For Form A,
it is reported by grade and by race
(Black vs. White), for grades 6, 8, 10, 12.
The standard errors of measurement indicate
that the test can differentiate groups by
grade; i.e., there is almost no overlap
of scores between grades. Additionally,
it also indicates it can separate the White
and Black examinees, the latter having lower
means in all grade samples.
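The standard error of measurement is conventionally derived from a score standard deviation and a reliability coefficient. The sketch below shows that textbook relationship only; the manual's actual figures are not reproduced in this review, so the standard deviation of 10 is hypothetical, and .84 is simply the lower bound reported above for the Kuder-Richardson coefficients.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


# Hypothetical SD of 10 score points; reliability of .84 from the text.
sem = standard_error_of_measurement(10.0, 0.84)
print(f"SEM = {sem:.1f} score points")
```

A smaller standard error relative to the distance between group means makes it easier to distinguish groups, which is the sense in which the review discusses separation between grades.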
3. Test-Item Structure.
Only a sketchy statement about item difficulty
is made in the manual. It states
that enough "easy" items were included so
as to permit most examinees to answer a
considerable number of items correctly.
This area of test review is fairly critical
for the present purposes, since information
on item construction and item selection allows
a judgment of the representativeness of
the entire item pool in reference to actual
examinee behavior. Lack of information in
this area is considered a serious drawback.
Appropriateness

1. Instructions. Examinee instructions are recorded
on tape. The manual only has instructions
about tape loading, unloading, materials,
etc. Thus, a reviewer of this test cannot,
from the manual, tell whether instructions
meet the criteria of the present review.

2. Items. Although it is not clear from item
construction procedures if actual behavior was
sampled, items do appear relevant and motivating.
The propriety of the items is maintained.

3. Format and procedure. The quality of the
paper and layout is good, although facsimiles
of actual materials were not always presented.
Layout, timing and complexity are acceptable
under present standards. Responses are recorded
on the test booklet. Scoring is done by hand.
This form of scoring is not considered adequate.


Normative Standards

Percentile norms are presented for Verbal, Numerical,
and Verbal + Numerical scores, for both forms. The normative
data was obtained for both School groups and Industrial
groups. For School Grades 6, 8, 10 and 12, norms
are available with Form A. Forms A and B were normed on
the various Industrial Groups, presumably adults. There
seems to be some geographic representativeness in the
Industrial sample, although no data was given on the ethnic,
sex, and socioeconomic sampling. The School Groups are
broken down by race ("White vs. Negro"), and only a Northern
and a Southern School was sampled. No item
statistics are reported. The norms presented in reference
to Form "A" are considered usable for the purposes
envisioned in the present review. The major drawback
is that these norms are anchored solely on the normative
sample, and thus are not interpretable with reference to
behaviors subsumed within the concept of functional literacy.

Administration

The personnel required for testing and scheduling
time are acceptable. However, the conditions of testing
require special equipment (tape recorders), and the manual
scoring system is not amenable to reliable machine scoring.
A package of 100 tests of either form costs ea 00.
Scoring keys and manual are $1.70 for a set of two. Instruction
cassettes are $8.50 each; two must be purchased
if both numerical and verbal tests are to be administered.


Interpretation

Beyond the norms mentioned above, no other aid in
interpreting scores is given. Scores are not referenced
to particular behaviors or tasks which examinees are
capable of performing at different score levels. Thus,
the scores are anchored solely on the normative samples.
To the extent that the normative sample is at variance
with the population for which the test is intended, the
above norms lose their utility.


Evaluation

This test, as is, cannot be rated appropriate for the
Title I Study, primarily because it was intended for,
tested, and normed with grades 6, 8, 10, 12 and
Industrial groups. Additionally, there is no information
on item construction, and thus it is impossible to determine
whether observations of reading behaviors were used
for item generation.

On the positive side, this test includes both computational
and reading skills, and it has been widely
normed.

Modification of this test to suit the purposes of the
Title I Study could probably be accomplished, provided
that the publisher has item statistics on the remainder
of the item pool. The new test, of course, could not
be rated by present norms, and thus a clinical pretest
would become the absolute minimal requirement for observing
how a new test would behave with a sample appropriate
to the Title I Study.




National Assessment of Educational Progress in Reading
Released Exercises

Education Commission of the States
300 Lincoln Tower
Denver, Colorado 80203


General Description

The National Assessment of Educational Progress in
Reading (NAEP) studies an assortment of reading skills at
four age levels: 9, 13, 17, and 26 to 35 years. The study
was concerned with the ability of Americans to read printed
materials, and more specifically, "with those reading
skills usually taught in schools and with the percentages of
Americans who have attained those skills." The total sample
included 98,016 people ranging in age from 9 years to young
adults. The NAEP exercises were developed around 8 themes:
1) understanding words and word relationships; 2) graphic
materials; 3) written directions; 4) reference materials;
5) gleaning significant facts from passages; 6) main ideas
and organization; 7) drawing inferences; 8) critical reading.
Of the original pool of items, nearly 200 have been released
for public use. The results obtained can be examined
according to several group characteristics: 1) sex; 2) Black
or White race; 3) parental education; 4) geographic region;
5) size and type of community; and 6) age.



Test Background

1. Purpose
The purpose of the NAEP exercises is to assess
the percentages of Americans who have attained those
reading skills usually taught in schools. These
skills were categorized according to the eight themes.
The themes are based upon six reading objectives:
1) comprehending what is read; 2) analyzing what is
read; 3) using what is read; 4) reasoning logically
from what is read; 5) making judgements concerning
what is read; and 6) having attitudes about, and an
interest in reading.

2. Clarity of Purpose to Examiners
In the NAEP study, the items were administered
by professionals who had been trained for the task.
It is reasonable to assume that the purpose of the
assessment was made clear to them. Because a test
package is not available, the clarity of instructions
cannot be determined.

3. Compatibility of Purpose and Test Construction
The test was constructed on the basis of five
reading objectives formulated by a committee of lay
and professional advisors. Each exercise was developed
within the framework of these objectives. The eight
themes were then used to classify the exercises. Thus,



the purpose and method of test construction are highly
compatible.

4. Compatibility of Purpose with Item Sampling
The items were not sampled from actual extracurricular
literacy experiences in the lives of the
population of persons who would take the exercises.


Psychometric Quality

1. Validity
a. Content Validity
The exercises were not designed as a test
of functional literacy in the sense adopted for
the Title I Study of Sustaining Effects. Instead,
the purpose of the NAEP exercises is explicitly
linked to the academic setting, having essentially
the same purpose as an achievement test; i.e., to
measure reading skills usually taught by schools.
The stimulus materials do not represent those
commonly encountered in real-life reading and
computational tasks by children in grades 4 to 8.
Computational tasks are not included. Most of the
stimulus materials would be more common for older
children and adults. Moreover, the relevance of
the materials to economically or educationally
disadvantaged children is unknown.




The real-life reading and computational
behavior domains of children in grades 4 to 8
were not directly addressed by the NAEP exercises.
The exercises were developed upon a structure of
reading objectives. Although these reading objectives
probably overlap to some degree with the
real-life behavior domains of the sample of children
to be tested in the Title I Study of Sustaining
Effects, they also include aspects of reading which
are probably not critical to a functional level of
literacy. In addition, these objectives were
formulated by logical means by a committee instead
of being drawn directly from the actual reading
domain of children. The individual exercises
were developed by a test construction contractor
and reviewed for acceptance by NAEP consultants.
Both language and graphic representations
appear to be uncommon, not entirely relevant, and
too difficult for children in grades 4 to 8. This
would be particularly true for a student sample
containing a substantial number of economically
and educationally disadvantaged children.
Many of the NAEP exercises would reasonably
be judged to possess properties representing various
socioeconomic functions or benefits. Others do not
possess these properties. Socioeconomic importance
was used as one criterion for acceptance during
the NAEP review of potential items.


As desired, the materials are not referenced
to specific program objectives. In terms of
criterial objectives, the definition of reading
skills used by NAEP does not coincide with the
definition of functional literacy adopted for
the Title I Study of Sustaining Effects. Thus,
no clear link between NAEP exercises and the
operative definition of functional literacy is
possible.

b. Empirical Validity
Results of NAEP exercises have not been related
to other measures taken at the same time.
Neither have they been related to other measures
taken at later times. No studies were reported
in which performance on the NAEP exercises was
related to other psychological, educational, or
socioeconomic independent variables.


c. Construct Validity
The NAEP exercises were not founded upon
a psychological, educational, linguistic or educational
theory, or related to educational practices.


2. Reliability
a. Comparability
Although a series of separate reading exercise
packages were constructed, these were not
regarded as alternate forms of the instrument,
and statistics of comparability were not developed.
b. Stability
No information on test-retest correlations
was available.
c. Internal Consistency
No indication of internal consistency was
provided.
d. Standard Error of Measurement
Standard errors of the percentage of each response
for each item are shown for the total national
sample as well as for each demographic grouping.
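To indicate what a standard error of a response percentage represents, the following is a minimal sketch using the simple binomial formula. It assumes simple random sampling, which is an assumption of this illustration and not a description of how the NAEP figures were computed; any design effect from the actual sampling plan is ignored, and the percentage and group size shown are hypothetical.

```python
import math


def standard_error_of_percentage(percent, n):
    """Binomial standard error of a percentage, in percentage points.

    Assumes a simple random sample of n respondents.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = percent / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / n)


# Hypothetical case: 50% of 2,500 respondents choose a given response.
se = standard_error_of_percentage(50.0, 2500)
print(f"standard error = {se:.1f} percentage points")
```

Because the standard error shrinks with the square root of the group size, percentages for small demographic groupings are less precise than those for the total national sample.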


3. Test-Item Structure
a. Item Construction
The definition of functional literacy employed
in the Title I Study of Sustaining Effects does
not correspond with the reading behaviors addressed
by the NAEP. Therefore, the NAEP item pool cannot
be said to be relevant and representative of the
operative definition of functional literacy. The
NAEP items were constructed by a test development
contractor and item sampling performed via the
expert judgement of a review panel.
b. Item Selection
Selection was based on the judgement of a
committee rather than on observation of actual
behavior.
c. Item Difficulty
Evidence of the difficulty of individual
released items relative to the total set of
items in each form was not available. Item
difficulty for various demographic strata was
provided for each item.

Appropriateness

1. Instructions
a. Clarity
The vocabulary and syntax of the exercise
instructions are sufficiently simple and direct
in most cases to be suitable to children in
grades 4 to 9. In some cases, however, the
instructions are somewhat confusing.
b. Purpose
The released exercises are not accompanied
by a statement to examinees on the purpose and
intended use of the exercises.
c. Comprehensiveness
The exercise instructions are sufficiently
comprehensive in describing the task requirements
to the examinee. Although no information
on the relation of guessing and scoring is provided,
a response category labeled "I don't know" was
provided for many exercises.
d. Sample Items
Because the NAEP released exercises are not
formatted into a test package, a sample item was
not offered.
Separate instructions accompany the stimulus
materials for each exercise. Exercise instructions
would be amenable to presentation in an oral mode.


2. Items
a. Motivation
Most items do not appear sufficiently relevant
or interesting to inspire the intrinsic
motivation of children in the 4 to 8 grade range.
b. Propriety
No invasion of privacy, sexist, racist, or
otherwise offensive content was identified in
the NAEP exercises.


3. Format and Procedure
a. Physical Quality
The NAEP exercises are not presented in a
test package. Therefore, many factors pertaining
to physical quality would be the responsibility
of the secondary user. The quality of graphic
representations, a factor inherent in the exercises,
is good.
b. Layout
The arrangement of items permits ready
recognition of separate items. However, the
relation of the stimulus material to the
exercise question is sometimes inadequate for
clarity.
c. Timing
Because the NAEP exercises are not formatted
into a test package, the secondary user of these
exercises would determine the test time by selecting
a particular number of exercises for a given
administration need. In their primary use, sets
of exercises were used which required a 35 minute
test period.
d. Response Mode
As presently formatted, the NAEP exercises
are not amenable to machine scoring.
e. Complexity
Each exercise requires a single, simple,
and direct response. Many of the stimulus materials
have been used as the basis for multiple items.

Normative Standards

1. Data Available
The NAEP Study obtained data on each exercise for
a variety of demographic variables. The item difficulties
are therefore available for children aged 9, 13, 17
and for adults, as well as for race, geographic region,
etc.

2. Normative Sample
The available data on NAEP exercises was obtained
for children aged 9, 13, and 17 years, and adults
aged 26 to 35 years. The primary strata of the
sample (geographic region and community type and size)
did not systematically address the age and economic
variables that are central to the Title I study.

3. Representativeness
The NAEP sample was not systematically stratified.
The results obtained from a survey of 98,016 people
were examined on the basis of various group characteristics
including: sex, Black or White race, parental
education, geographic region, size and type of
community, and age.

4. Reporting
The data for the NAEP released exercises are
reported separately for a variety of group characteristics.

5. Item Statistics
Item statistics are reported for both the whole
sample and for the various group characteristics.


Administration

1. Personnel
The NAEP exercises are amenable to administration
by nonexpert personnel.

2. Scheduling
A set of NAEP exercises could be selected to
produce a 30 minute testing period.

3. Setting
The NAEP exercises are amenable to group administration
in a usual classroom setting.

4. Scoring
Some of the NAEP items are machine scoreable,
while others require hand scoring. The reliability of
the machine-scoring procedures is unknown.

5. Materials
The NAEP exercises are entirely of the paper-and-pencil
variety.

The NAEP released exercises belong to the public
domain.


Interpretation

1. Manuals
The NAEP released exercises are not formatted
into a test package. Consequently, there is no test
manual.

2. Meaning
The meaning of a score on an NAEP exercise
derives purely from the examinee's ability to successfully
cope with the content and demands of the exercise.
Performance on NAEP items cannot be related to a hierarchy
of performance levels.

3. Scales
The primary exercise scores are not directly
understandable.

4. Implications
The implications of performance on NAEP exercises
are limited to the ability to perform any
particular exercise. The implication is related to
the NAEP reading objectives to the extent that the
exercises are valid derivatives of these objectives.
The relationship of NAEP performance to educational
policy has not been developed.

Evaluation

The advantages of the NAEP exercises are that they have
been used with children of the age group to be involved in
the Title I Study of Sustaining Effects, and item statistics
are available for several examinee variables that are important
for the Title I purposes (e.g., age, race, parent education,
geographic region, and community size). Further, they could
be formatted into a test package of appropriate length and
to meet other criteria of administration.

In spite of the positive qualities, the NAEP exercises
have several undesirable features. The most glaring arises
from the lack of a computational subtest. In addition, the
stimulus materials were not judged to be intrinsically
motivating and pilot work with them has revealed them
to be sufficiently difficult for children in the
4 to 8 grade range as to generate some degree of examinee
resistance. The item development process has the conceptual
difficulty of having been created by experts, rather than
having flowed from observations of real-life experiences of
the population of potential examinees.

The possible utility of the NAEP exercises to the Title I
study is two-fold. First, an examination of the item statistics
of certain items judged to have qualities of intrinsic motivation
and suitability to the Title I measurement purposes may result
in the selection of particular items. Secondly, the NAEP items
may have heuristic value to the development or modification of
a functional literacy assessment tool. As they stand, however,
the collection of items for 9 and 13 year olds does not, in the
reviewers' opinion, represent an acceptable means of assessment
for the Title I study. At a minimum, this collection would
have to be modified and supplemented extensively.




New York State Basic Competency Test in Reading

A Derivation of the Adult Functional Reading Study by Educational
Testing Service

The University of the State of New York
The State Education Department
Albany, New York 12234


General Description

The New York Basic Competency (NYBC) Test in Reading is
available in an experimental form only. It consists of 28
different samples of reading materials, each accompanied
by one, two, or three questions which are designed to measure
the reading competency of students in grades 9 to 12. The total
score is generated by 40 multiple choice questions.
On the basis of an arbitrary cutoff score of 65%, a score
of 26 is recommended as the minimum passing score. The
test requires approximately 45 minutes, but this time
period has been implemented in a flexible manner in order
that examinees may be allowed more time to complete the
test if they need it. The test is group administered by
nonexpert personnel.
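The arithmetic behind the recommended passing score can be checked directly. The following is a minimal sketch; the function name minimum_passing_score is hypothetical, and rounding up to the next whole item is an assumption that happens not to matter here, since 65% of 40 items is exactly 26.

```python
def minimum_passing_score(n_items, cutoff_percent):
    """Smallest whole number of correct items meeting the percentage cutoff.

    Integer arithmetic is used so that no floating-point rounding occurs.
    """
    return (n_items * cutoff_percent + 99) // 100


# Figures from the text: 40 multiple choice questions, 65% cutoff.
print(minimum_passing_score(40, 65))
```

For test lengths where the cutoff does not fall on a whole item, the rounding rule would need to be stated explicitly by the test developer.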


Test Background

1. Purpose
"The NYBC Tests are designed to provide a
measure of the extent to which pupils, beginning
in grade 9, have achieved a minimum level of
mastery of the basic competencies that will be
required of them as adults. The goal of the
school is to ensure that every pupil will have
reached such basic competency levels before
leaving high school, whether as a graduate or a
drop-out."


2. Clarity of Purpose to Examiners
No concrete explanation was given in the manual
of "basic competencies," nor of the basis for
decisions on what is required of adults. A review
of the Adult Functional Reading Study performed
by Educational Testing Service (ETS) would be
necessary (Murphy, 1973) before these aspects of
the test's purpose would be understood.

3. Compatibility of Purpose and Test Construction
The New York Basic Competency Test in Reading
is a direct derivation of the instruments developed
by ETS in the Adult Functional Reading Study
(Murphy, 1973 - 1975). Thus, the two primary
construction efforts made by ETS in that study
also apply to the NYBC Test in Reading; i.e.,
1) the nation-wide survey of actual adult reading
experiences, and 2) the large-scale field test
of instruments constructed on the basis of the
survey results. The purposes underlying these
two efforts were to determine the nature of the
actual reading experiences of American adults,
and to develop a means of measuring an adult's
ability to cope with these real-world literacy
experiences.


The purpose of the NYBC Test in Reading is
very similar to the measurement objective of
the Adult Functional Literacy study; namely, to
measure the extent to which persons have achieved
a minimum level of mastery of the reading competency
expected of them as adults. The major difference
is that the NYBC Test's purpose is targeted for a
particular segment of people, secondary students
in grade 9 onward. If the ETS tasks are accepted
as valid instances of the reading competencies
that will be expected of these persons, then the
purpose of the New York State Basic Competency
Test in Reading is highly compatible with the
manner in which it was constructed.

4. Compatibility of Purpose with Item Sampling
The universe of items, from which the test
items were sampled, was obtained from a stratified
sample of American households. Approximately 100
primary sampling units were generated. These set
geographic region and size of community as the
first and second order strata from which a representative
sample of U.S. households was sampled.
One person age 16 or over was selected for interviewing
from each household by a predetermined
selection table. No special provision was made
for sampling items in a manner which would accommodate
differences between ethnic, racial, or varying
income groups.


Psychometric Quality

1. Validity
a. Content Validity - The NYBC Test in Reading is
defined as a measure of the minimum level of
mastery of basic competencies required of adults.
This definition may reasonably be interpreted
as a measure of functional literacy.
The stimulus materials, by virtue of the
method by which they were generated in the ETS
study, represent those commonly encountered in
real-life reading tasks of persons in grades
9 upward. However, the relevance of the materials
to the experiences of economically or educationally
disadvantaged persons is unknown. It is known
that they were not sampled from children in
grades 4 to 8, and therefore do not represent
the materials encountered in real life by
children of the type involved in the Title I
Study of Sustaining Effects. The behavior
domain is not matched to tasks and skills
required of children between the ages of 9-14.


69 


wae 


74° 


In selecting items for the test's target group of 9th to
12th grade students, skill and difficulty level were
considered by the staff members of the Bureau of Reading of
the New York State Department of Education. No explicit
criteria for reviewing items on this basis were reported.
Apparently, expert judgement was used. However, the second
field testing was limited to a sample of students in which
40% fell below the Statewide Reference Point on the
PEP test.*

Although the symbolic domain is fairly appropriate with
respect to pictorial representations and degree of technical
language, the general vocabulary level appears to be too high
for children in grades 4 to 8. The content is the most
inappropriate aspect of the items. Benefit of task performance
was specified as one of the criteria used to select items for
the test. This procedure is not described.

In the ETS study, a socio-benefit rating was obtained on all
items from an advisory panel.


*The Statewide Reference Point is the
cutoff point between the third and fourth
stanine, with 23% below and 77% above.
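
The 23%/77% split in the footnote is what the conventional stanine bands would give. The sketch below is only an illustration: it assumes the PEP test uses the standard normal-curve percentages for stanines 1 through 9 (4, 7, 12, 17, 20, 17, 12, 7, 4), which the report does not list, and the function name is hypothetical.

```python
# Conventional percentage of a norm group falling in each stanine, 1 through 9.
# Assumption: the PEP test follows these standard bands; the report does not say so.
STANINE_PERCENTS = [4, 7, 12, 17, 20, 17, 12, 7, 4]


def percent_below_cutoff(lowest_stanine_above):
    """Percent of the norm group in stanines below `lowest_stanine_above`."""
    return sum(STANINE_PERCENTS[: lowest_stanine_above - 1])


# Cutoff between the third and fourth stanine.
below = percent_below_cutoff(4)
above = 100 - below
print(below, above)
```

Under that assumption the first three stanines account for the share of the norm group that falls below the Statewide Reference Point.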


It is unknown whether or not the ETS benefit ratings were
utilized in the selection of items for the NYBC Test in
Reading.

In the judgement of Pacific Consultants, the 40 items in
Form L are of relatively high socioeconomic benefit. They do
suffer, however, from the omission of some areas which could
be of greater value to the grade 4 to 8 children of the
Title I Study of Sustaining Effects. They are also weakened
by the somewhat contrived nature of some items.


As desired, the materials are not referenced to specific
program objectives. In terms of criterial objectives, the
tasks of the NYBC Test in Reading benefit from the rationale
underlying the development of instruments in the Adult
Functional Reading Study. Consequently, the definition of
functional literacy is based on a large scale sample of the
actual reading tasks of adults, and the test items are
versions of these tasks. Thus, the link between definition
and items is clear. The one reservation is that the ETS items
were based upon a survey of persons aged 16 and older, while
the NYBC Test is used with persons approximately 14 to 17
years old. These two age groups are in contrast to the age
range of 9 to 14 years to be addressed in the Title I Study
of Sustaining Effects. Thus, to the degree to which it is
reasonable to assume that the difference in age group
modifies the definition of functional literacy, the link
between definition and the test items becomes weakened.

b. Empirical Validity - Results are reported on the
relationship of this test with the "Ninth grade PEP tests in
reading and mathematics." No description or citation of the
PEP tests is provided.


Discussion of results indicated that ". . . the Basic
Competency Tests and the PEP tests have a considerable
overlap of function in both areas, more noticeably in
mathematics. Still, the correlations are well below the
level required to consider the Basic Competency Tests and
the PEP tests 'parallel'."

Approximately 5% of the students who had PEP scores above
the Statewide Reference Point failed the NYBC Test in
Reading. Slightly more than half of the students obtaining
PEP test scores below the Statewide Reference Point were
able to pass the NYBC Test. No information was available on
the other aspects of empirical validity.

c. Construct Validity - The test is not based directly upon
the theoretical constructs of psychology, education, and
psycholinguistics; nor upon educational practice and policy.


2. Reliability

a. Comparability - One report on the test described its
relation to the PEP test. This report also alluded to
parallel forms. Only Form L was made available to Pacific
Consultants. No information was available on the construction
of parallel forms.

b. Stability - No information on test-retest correlations
was available.

c. Internal Consistency - The math test is not available to
Pacific Consultants. Neither are data relevant to the
relationship between the math and reading scales of the test
available.

d. Standard Error of Measurement - Not available.


3. Test-Item Structure

a. Item Construction - This process was not clearly
described. Aside from the knowledge that the items are
derived from the original pool established in the ETS Adult
Functional Reading Study, nothing is known about how the
items were selected for the various forms of the NYBC Test
in Reading.

b. Item Selection - No information is available.

c. Item Difficulty - Statistics calculated on item difficulty
showed that the difficulty level for the items ranged from
.35 to .97, and that an average difficulty of .80 was
obtained for the eight preliminary forms of the test.
Table 1 below shows the number and percent of items for each
difficulty level.


Table 1. Number and Percent of Items at
Successive Levels of Difficulty.

Level of        Number       Percent of
Difficulty      of Items     Total Items
-----------------------------------------
.90+               48            41
.80 - .89          26            22
.70 - .79          14            12
.60 - .69          12            10
.50 - .59           8             7
.40 - .49           4
.30 - .39           3             3
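
A tabulation like Table 1 can be sketched as follows. This is an illustration only: it assumes the conventional definition of item difficulty as the proportion of examinees answering an item correctly (the report does not define the statistic), and the response data, function names, and band labels are hypothetical rather than taken from the NYBC field tests.

```python
from collections import Counter

# Band labels in the style of Table 1, keyed by the tenths digit of the difficulty.
BANDS = {9: ".90+", 8: ".80 - .89", 7: ".70 - .79", 6: ".60 - .69",
         5: ".50 - .59", 4: ".40 - .49", 3: ".30 - .39"}


def difficulty_band(n_correct, n_examinees):
    """Band for an item answered correctly by n_correct of n_examinees."""
    # Integer arithmetic avoids floating-point edge cases at band boundaries.
    tenth = min((10 * n_correct) // n_examinees, 9)
    return BANDS.get(tenth, "below .30")


def tabulate(responses):
    """Map each band to (number of items, percent of total items).

    `responses` holds one row per examinee, each a list of 0/1 item scores.
    """
    n_examinees = len(responses)
    n_items = len(responses[0])
    counts = Counter()
    for i in range(n_items):
        n_correct = sum(row[i] for row in responses)
        counts[difficulty_band(n_correct, n_examinees)] += 1
    return {band: (n, round(100 * n / n_items)) for band, n in counts.items()}


# Hypothetical data: 10 examinees, 4 items answered correctly by 10, 8, 5 and 9.
responses = [[1, int(k < 8), int(k < 5), int(k < 9)] for k in range(10)]
print(tabulate(responses))
```

Percentages are rounded to whole numbers here; the rounding rule used for the printed table is not stated in the report.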


Appropriateness

1. Instructions

a. Clarity - The vocabulary used was somewhat too
sophisticated for children in grades 4 to 8. The requirement
that the student read the item instructions contradicts some
of the fundamental assumptions about the need to measure
minimal competency in reading.

b. Purpose - The purpose of the test was explained well if
the student can read the directions on page 2 of the test
booklet.

c. Comprehensiveness - The comprehensiveness of instructions
is satisfactory if the student can read them.

d. Sample Items - The sample item accurately illustrates the
task requirements and the level of task difficulty. These
benefits may not apply, however, if the narrative surrounding
the sample item cannot be read.

e. Mode - The instructions must be read by the student. This
is a major drawback, as the reading skills that are needed
exceed those necessary to respond to many test items.


2. Items

a. Motivation - Because the content of the items is relevant
only to adults, and not to children, poor intrinsic
motivation should be expected for 4th to 8th graders. The
contrived nature of some items may also preclude high
intrinsic motivation in older students.

b. Propriety - No instance of invasion of privacy was found
in the items. Deliberate attempts to utilize non-sexist,
non-racist language and content were apparent.


3. Format and Procedures

a. Physical Quality - The paper, print, and pictorial
representations were of reasonably good quality.

b. Layout - The test layout clearly separated the items from
one another.

c. Timing - Test length was developed with timing in mind
(approximately 1 class period of 45 minutes), but open-ended
timing was recommended. These proposed lengths are both in
excess of the 20 to 30 minute length desired for the
Title I Study of Sustaining Effects.

d. Response Mode - Five types of answer sheets are available
for each test, four of which are intended solely for machine
scoring, and one for either machine or hand scoring.

e. Complexity - Several items are based on some of the test
materials. However, each item requires only one simple and
direct response.


Normative Standards

No normative data on children in grades 4 to 8 are available.
Neither the sampling procedures for item generation nor those
for field testing attempted to obtain a sample representative
of racial, ethnic, or socioeconomic strata. Geographic region
and community size were the primary strata for the item
generation task. Field testing was performed in New York
State communities of varying size.


Administration

The NYBC Test in Reading can be administered by non-expert
personnel. Test timing could easily be modified for a
30 minute period. The test is suitable for administration in
a usual intact classroom setting. Scoring is by machine. The
reliability of the machine scoring process required is
unknown.

The test materials are entirely of the paper-and-pencil
variety. The test cannot be purchased from ETS. The New York
State Department of Education considers it an experimental
instrument which is not yet ready for dissemination.


Interpretation

1. Manuals - The test manual is very brief and does not
conform to APA standards for test manuals.

2. Meaning - The meaning of a score on the NYBC Test in
Reading is not clear.

3. Scales - The primary test scores are not directly
understandable.

4. Implications - The implication of the test score is that
the examinee can or cannot perform certain real-world tasks.
The relationship of this implication to educational policy
has not been fully developed.
The lack of relevance of the items of the New York State
Basic Competency Test to children having the age and
socioeconomic attributes of those in the Title I study
presents a serious drawback for the costly consideration.
Further, the test does not contain a computational section.
The issue of item suitability is so problematic that it
renders the modification of this instrument impractical.


Reading/Everyday Activities in Life (R/EAL)

Marilyn Lichtman, Ed.D.
Virginia Polytechnic Institute

Cal Press Inc.
76 Madison Avenue
New York, New York 10016

Description

R/EAL is a test of reading, divided into nine reading
"selections", each of which is claimed to represent a
general category of reading "...often encountered by
individuals of high school age or above." The manual also
states that "...all indications are that it should be useful
with anyone age ten or older."

The nine reading selections, with five questions per
selection, are as follows:

1. A set of road signs.
2. A T.V. schedule.
3. A set of directions for preparation of a cheese pizza.
4. A reading selection on the [illegible] of narcotic drugs.
5. A food market ad.
6. An apartment lease.
7. A road map.
8. A want ad.
9. A job application.


The test is administered by means of individually operated
cassette players and earphones; this allows the test to be
self-administered, self-directed and self-paced. Group
administration is accomplished by having audio equipment for
each student. Recently a script of instructions has been made
available which would eliminate the necessity of audio
equipment.


Test Background

Purpose

The manual states that "the R/EAL should be used to assess
whether or not an individual is functionally literate." It
claims a suitability for minorities, Blacks, Puerto Ricans,
Mexicans, and others who have been singled out by the bias
of the traditional standardized tests.

As an evaluation tool, R/EAL claims to give a determination
of progress made by students. It can also be used to measure
to what extent students in a given program have basic
literacy skills. It cannot be determined from the manual
whether test construction follows its intended purpose.

Psychometric Quality

Validity


The R/EAL purports to measure, on an individual basis,
responses to questions that are easily identifiable with the
examinees' everyday life. The correlation between the R/EAL
and the Stanford Achievement Test is .74. Content validity
is based upon the generation of items from a task analysis
used in the definition of test objectives.


Reliability

Using a minority sample of persons with an average of 5th
grade reading achievement scores, an internal consistency
estimate of .93 was made with the Kuder-Richardson
Formula 20.
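
For readers unfamiliar with the statistic, the following is a minimal sketch of Kuder-Richardson Formula 20 for items scored right or wrong. It is not the R/EAL authors' computation: the response data are hypothetical, the function name is illustrative, and the total-score variance is taken as a population variance (dividing by N), which is one common convention and an assumption here.

```python
from statistics import pvariance


def kr20(responses):
    """Kuder-Richardson Formula 20 for rows of 0/1 item scores.

    KR-20 = (k / (k - 1)) * (1 - sum(p_i * q_i) / variance of total scores),
    where k is the number of items, p_i the proportion passing item i,
    and q_i = 1 - p_i.
    """
    n_examinees = len(responses)
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    var_total = pvariance(totals)  # assumption: population variance of total scores
    sum_pq = 0.0
    for i in range(k):
        p = sum(row[i] for row in responses) / n_examinees
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)


# Hypothetical data: 5 examinees, 4 items.
sample = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(sample), 2))
```

The sketch does not guard against the degenerate case where every examinee has the same total score, in which the variance is zero and the coefficient is undefined.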


Test-Item Structure

The items in the R/EAL were identified only as reading tasks
that one could reasonably expect to encounter in his/her
everyday activities. A review of the items suggests that some
of them would be inappropriately difficult for the low end of
the intended sample (e.g., items based on a facsimile of an
apartment lease).


Appropriateness

Instructions

Recorded instructions were sufficiently clear and an
explanation of the purpose was provided in the manual. The
R/EAL provided two sample questions in the test booklet.

The recorded administration procedures which permitted
self-pacing and self-administration may have a positive
effect on examinee motivation. The use of facsimiles of
real-life literacy experiences may have also enhanced the
motivation of adult examinees. The irrelevance of these
materials to the lives of children in grades 4 to 8 makes
this a questionable advantage to the present study. The items
were not judged by the reviewers to have offensive content.

Format and Procedures

The test booklet was of generally good physical quality, and
the illustrations and graphics were realistic. The printed
layout of the R/EAL was adequate. However, since the
sequencing was heavily dependent upon the tape recorded
instructions, mechanical problems could seriously disrupt the
necessary link between instructions and stimulus materials.

The R/EAL has made provisions in its design for the
differences in the speed with which a student can finish the
test. It also is taped in such a way that individuals and
groups can start and finish at different times, without
having negative effects on the examinee. There is no
designated amount of time given to the students to finish
the test.

The R/EAL's test booklet is arranged in such a way that
students can write directly in the test booklet, but it is
also designed to be hand scored, using a key provided in the
manual.


Normative Standards

Normative data are not available. The manual for the R/EAL
suggested that the test was highly suitable for minorities,
Blacks, Mexicans, rural groups, and all of those shown bias
by the standardized test. However, NO EVIDENCE is presented
to substantiate this claim. Item statistics were not
presented.


Administration

The R/EAL may be administered in a classroom setting by
reading a prepared script of instructions. It can be
administered by non-expert personnel. The test requires
approximately 20 to 30 minutes. Hand scoring is required.


Interpretation

1. Manuals
The manual for the R/EAL was fairly elaborate and clear in
the areas that it covered, but obviously missing was norming,
or any evidence for purposes of test interpretation.


2. Meaning
The only clue to score interpretation is the statement that
80% is "passing". There is no information as to how this
percentage was deduced.
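
As a worked illustration of what the 80% figure would mean in raw-score terms, the sketch below assumes that "passing" means answering at least 80% of all items correctly and that the test has nine selections of five questions each, as described above. The manual is not reported to state how the rule is applied, so the function and constant names are hypothetical.

```python
N_SELECTIONS = 9              # reading selections in the R/EAL
QUESTIONS_PER_SELECTION = 5   # questions per selection
PASSING_PERCENT = 80          # stated "passing" level


def min_correct_to_pass(n_items, passing_percent):
    """Smallest number of correct answers that reaches the passing percent."""
    # Ceiling division in integer arithmetic, so no floating-point rounding.
    return -(-n_items * passing_percent // 100)


total_items = N_SELECTIONS * QUESTIONS_PER_SELECTION
print(total_items, min_correct_to_pass(total_items, PASSING_PERCENT))
```

Under these assumptions the rule translates into a fixed raw-score cutoff, which is still not tied to any external standard of performance.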


3 & 4. Scales and Implications
Beyond the "passing" score level, no information is
available to make a judgement on these criteria.
Evaluation

The main strength of this test is the real-life
characteristics of its item formats. Beyond this, little can
be said to recommend it for use in the Title I Study. There
is no evidence as to whether the items can be used with
children in grades 4 to 8. [illegible] is available. The use
of cassettes or tapes makes it unusable for present purposes,
although the newly prepared instruction script could
alleviate this drawback. A passing grade of 80% is the only
normative data presented, and is not related to other kinds
of real-world performance.


IMPLICATIONS AND RECOMMENDATIONS

As a result of the literature review and the test evaluation
activities of the functional literacy project, a series of
conclusions and recommendations have become apparent.


Available Tests Judged by the Review Criteria

None of the tests reviewed satisfactorily matched all
criteria. The inadequacies of the tests for purposes of the
Title I study are basic. No test could be found that is
appropriate to the 4th to 8th grade age group. The tests
were constructed either for adults, young adults, or pupils
from grade 6 to grade 12. All six of the tests contain
varying amounts of materials that are commonly encountered
in real life reading, with test items constructed from these
materials. However, none of the materials were obtained by
actual observation of the behavior of children. Instead, an
"expert" judgement was made as to the materials that people
would have to read in order to function in society. The
problem with this approach is that this kind of "expert"
judgement can defensibly be made for adults (e.g., some
adults have to pass a test to get a driver's license), but
it does not apply to young children aged 9-14 years.
Two of the tests, the Adult Performance Level Test (APL) and
the Fundamental Achievement Series (FAS), measured both
reading and computational skills. They cannot be used for the
Title I Study without modification because the former was
designed for adults only, and the latter for grades 6 and
above. The FAS could probably be modified at less expense,
since it incorporates the higher end of the intended sample;
i.e., grades 6 and 8. Instructions, however, are tape
recorded, and would therefore require modification. Although
there is no evidence that the test items in either the FAS or
APL are built around actually-observed, real-life reading
behaviors, the APL items are accurate facsimiles of literacy
stimuli commonly encountered by adults. Various other
important criteria were lacking in these tests, but since the
most basic criteria were not met, it is a moot point to
discuss additional inadequacies. Detailed descriptions of
these aspects are included in the test reviews.

The remaining tests, which measured reading skills only,
have various advantages and disadvantages. The National
Assessment of Educational Progress in Reading (NAEP) had the
most information on each item. Although individual items
rather than an administration-ready test package are actually
available, if the construction of a test is envisioned or
decided upon, then serious consideration should be given to
some of the items presented. Item statistics are available
by sex, race, geographic region, size and type of community,
and age.


The favorable aspect of the Basic Skills Reading Mastery
(BSRM) Test is that it has a test form for children 12 years
of age. It therefore was constructed for children who match
at least a portion of the Title I age range. The lack of
realistic facsimiles as item materials reduces its value to
the disadvantaged population of children to be studied.
Rather than having been developed from observations of the
actual reading experiences of children, the BSRM materials
were based upon expert judgement. Modification of this test
would require that a set of easier items be added for 9 to 11
year old children and that the representativeness of stimulus
materials be improved. Further, test instructions would have
to be converted to an oral presentation mode.

The Reading/Everyday Activities in Life (R/EAL) has the
important feature of presenting its items as actual
photographs or true-to-life drawings of the objects that
contain the reading matter. This is an extremely desirable
characteristic of item presentation, especially where
minority groups are to be tested. Unfortunately, most of the
items were constructed for an intended population of high
school graduates and older. Additionally, there is no
evidence that actual reading behavior was used as the basis
for item construction. Modification of the R/EAL for use in
the Title I Study would only succeed if items were selected
which are judged to have relevance to 4th to 8th grade
children, and then supplemented with other easy and
appropriate tasks. The instruction script and other details
of administration would require minor modifications in order
to make them entirely compatible with the testing purposes of
the Title I Study.


Possible Courses of Action

Since no test actually exists which meets the needs of the
Title I Study, two courses of action, other than test
selection, are possible.
1. The most idealistic solution would be to develop a test
   from the beginning. This would entail sampling the actual
   reading behaviors of children who have the Title I age
   and demographic characteristics. A technically sound
   item-building phase would then be required, with careful
   pretesting of the final instrument. The advantage of this
   option is that the final instrument would be a
   high-quality test which would be suitable for a wide
   range of future applications. Moreover, it would be most
   responsive to concerns regarding the testing of
   disadvantaged children. The disadvantage lies in the time
   and resources required, since it would be the option
   requiring the longest time-table. The work could not be
   completed within the present scope of work, nor by the
   December 1, 1975 deadline for submitting an instrument
   package to OMB for clearance. Should accommodations be
   made for the time and level of effort required, Pacific
   Consultants has the capability required to implement this
   option.

2. The second option open to the Title I Study's assessment
   of functional literacy is the construction of a test from
   multiple sources. This procedure would entail the
   combination of a computational section with either a
   partially suitable test of reading competency, or with a
   reading section constructed of items drawn from several
   instruments. In either case, some set of easy items would
   have to be added to both the reading and computation
   portions of the assessment tool. These easy items could
   either be new creations or, more likely, modifications of
   items extant in some of the tests reviewed.

   As an example of this second option, the BSRM, published
   by the State of Maryland, may be modified so as to
   include items appropriate for younger children. Then, the
   computational items of the FAS could be used to construct
   a computational subtest. Alternatively, items from the
   R/EAL, or items from the NAEP, could be modified or
   selected for use as reading tasks, and used in
   combination with the computational items from the FAS.
   Another version of this option would consist of
   constructing a complete test from all possible sources,
   sampling and building items by the use of test
   construction "experts." Any test produced by these
   methods would then have to be pretested, and modified at
   least once before it would be ready for a field pre-test.
   Pacific Consultants has the staff and technical
   capability to carry out this option within the current
   time-table and level of effort. Although this approach
   precludes the advantages of generating literacy tasks
   from the actual experiences of disadvantaged 4th to 8th
   grade children, it does maintain the qualities of a
   criterion-referenced approach to instrument construction.
   In so doing, it would supply the Title I Study of
   Sustaining Effects with a suitable instrument to
   complement the norm-referenced standardized tests of
   achievement.


REFERENCES

American Psychological Association. Standards for educational
   and psychological tests. Washington, D.C.: Prepared by a
   joint committee of American Psychological Association,
   American Educational Research Association, National Council
   on Measurement in Education, Frederic B. Davis, Chair, 1974.

Center for the Study of Evaluation, U.C.L.A. CSE Elementary
   School Test Evaluations. Los Angeles: Ralph Hoepfner and
   The Staff of the School Evaluation Project (Guy Strickland,
   Gretchen Stangel, Patrice Jansen, Marianne Patalino), 1970.

Murphy, Richard T. Adult Functional Reading Study, Final Report.
   Princeton, New Jersey: Educational Testing Service, 1973.

Murphy, Richard T. Adult Functional Reading Study, Supplement.
   Princeton, New Jersey: Educational Testing Service,
   [year illegible].

Northwest Regional Educational Laboratory. Tests of functional
   adult literacy: an evaluation of currently available
   instruments. Portland, Oregon: Nafziger, Dean H., Thompson,
   R. Brent, Hiscox, Michael D., and Owen, Thomas R., 1975.

Pacific Training and Technical Assistance Corporation.
   Evaluation of the Community Based Right to Read Program.
   Berkeley, California, [year illegible].


